-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sheet_utils refactor to add csv functionality (C4-1088) #276
base: kmp_sheet_utils
Are you sure you want to change the base?
Conversation
…er_agent, for example
… not the workbook level artifact. Better handling of init args.
… ItemManager.load to take a tab_name argument so that CSV files can perhaps infer a type name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the organization here, and the consideration for csv files which can easily be adapted for tsv files.
dcicutils/sheet_utils.py
Outdated
class AbstractTableSetManager: | ||
""" | ||
The TableSetManager is the spanning class of anything that wants to be able to load a table set, | ||
regardless of what it wants to load it from. To do this, it must support a load method | ||
that takes a filename and returns the file content in the form: | ||
{ | ||
"Sheet1": [ | ||
{...representation of row1 as some kind of dict...}, | ||
{...representation of row2 as some kind of dict...} | ||
], | ||
"Sheet2": [...], | ||
..., | ||
} | ||
Note that at this level of abstraction, we take no position on what form of representation is used | ||
for the rows, as long as it is JSON data of some kind. It might be | ||
{"col1": "val1", "col2", "val2", ...} | ||
or it might be something more structured like | ||
{"something": "val1", {"something_else": ["val2"]}} | ||
Additionally, the values stored might be altered as well. In particular, the most likely alteration | ||
is to turn "123" to 123 or "" to None, though the specifics of whether and how such transformations | ||
happen is not constrained by this class. | ||
""" | ||
|
||
def __init__(self, **kwargs): | ||
if kwargs: | ||
raise ValueError(f"Got unexpected keywords: {kwargs}") | ||
|
||
@classmethod | ||
def load_workbook(cls, filename: str): | ||
wb = cls(filename) | ||
return wb.load_content() | ||
def load(cls, filename: str) -> Dict[str, List[AnyJsonData]]: | ||
""" | ||
Reads a filename and returns a dictionary that maps sheet names to rows of dictionary data. | ||
For more information, see documentation of AbstractTableSetManager. | ||
""" | ||
raise NotImplementedError(f".load(...) is not implemented for {cls.__name__}.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this should be an abstract base class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I was quite clean enough. I know of such classes from Lisp, but hadn't been tracking Python's adoption of them. As I read the doc, having a method that does something isn't helpful because it's not in the MRO. Here I'm relying on __init__(self, **kwargs)
for some error handling that must be in the MRO.
I also gotta say that although I do find it useful to name these things in the way that says yeah, it's abstract, I have not found metaclasses to be as useful as I expected them to be. Almost every time I've found a use for metaclasses, there's a just as easy way to make it with regular classes. (Actually, in Python I often use decorators.) Part of the issue is that the uses invariably thwart static analysis, so I end up finding static analysis more important than the other stuff. Maybe such tools would understand this class as a special case because it might have widespread use, but in general metaclasses just make it hard to understand. We have code in dcicutils.misc_utills
for example that imlements classproperty
but PyCharm fusses at me because it doesn't understand why I'm using cls
instead of self
as an argument.
So I'm gonna pass on that at this time. I'll add a note in the sources for future consideration. If you'd like to convince me otherwise, let's chat interactively. I'm not fixed on this. It's just my, uh, base posture.
def _raw_row_generator_for_tabname(self, tabname: str) -> Iterable[SheetRow]: | ||
""" | ||
Given a tabname and a state (returned by _sheet_loader_state), return a generator for a set of row values. | ||
""" | ||
raise NotImplementedError(f"._rows_for_tabname(...) is not implemented for {self.__class__.__name__}.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can also consider making this class an abstract base class with these methods decorated to be abstractmethod
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment.
Yeah, I can probably trivially add tsv files. I'll look at that. |
Co-authored-by: drio18 <[email protected]>
pyproject.toml
Outdated
@@ -1,6 +1,6 @@ | |||
[tool.poetry] | |||
name = "dcicutils" | |||
version = "7.7.2.1b0" # to become "7.8.0" | |||
version = "7.8.1.1b1" # to become "7.9.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it safe to update PyYML to ^6.0.1? I tried locally and tests pass, and I didn't see anything too scary in the release notes. It's just that everytime I build locally I have to manually update to this version due to Mac M1 issues with 5.4.1.
…fully' removed by an editor that doesn't understand such whitespace might be significant in TSVs.
This is an improvement on Some tools for working with spreadsheets (PR #275) to generalize it a bit more and add support for
.csv
files.From the
CHANGELOG.rst
...New module
sheet_utils
for loading workbooks.Important things of interest:
Class
ItemManager
for loading Item-style data from either.xlsx
or.csv
files.Function
load_items
that does the same asItemManager.load
.Various low-level implementation classes such as:
Classes
XlsxManager
andCsvManager
for loading raw data from.xlsx
and.csv
files, respectively.Classes
ItemXlsxManager
andItemCsvManager
for loading Item-style data from.xlsx
and.csv
files, respectively.Contains a fix for a bug in
ff_utils.get_schema_names
(C4-1086).